Abstract

Computing the state-space topology of a dynamical system from scalar data 
requires accurate reconstruction of those dynamics and construction of an
appropriate simplicial complex from the results. The reconstruction process
involves a number of free parameters, and the computation of homology for a
large number of simplices can be expensive. This paper is a study of how to avoid a full
(diffeomorphic) reconstruction and how to decrease the computational burden. 
Using trajectories from the classic Lorenz system, we reconstruct the dynamics 
using the method of delays, then build a parsimonious simplicial complex—the 
“witness complex”—to compute its homology. Surprisingly, we find that the 
witness complex correctly resolves the homology of the underlying invariant set 
from noisy samples of that set even if the reconstruction dimension is well
below the thresholds specified in the embedding theorems for assuring topological
conjugacy between the true and reconstructed dynamics. We conjecture that
this is because the requirements for reconstructing homology are less stringent
than those in the embedding theorems. In particular, to faithfully reconstruct 
the homology, a homeomorphism is sufficient—as opposed to a diffeomorphism, 
as is necessary for the full dynamics. We provide preliminary evidence that a 
homeomorphism, in the form of a delay-coordinate reconstruction map, may 
manifest at a lower dimension than that required to achieve an embedding. 

Keywords: Topology, Delay-Coordinate Embedding, Nonlinear Time Series 
Analysis, Computational Homology, Witness Complex 


1. Introduction 

Topology is of particular interest in dynamics, since many properties—the 
existence of periodic orbits, transitivity, recurrence, entropy, etc.—depend only 
upon topology. This idea is commonly exploited in the computational topology 
community, often using the Conley index of isolating neighborhoods, to study 
dynamical invariants [1]. However, computing topology from time series can be 
a real challenge. First, one typically has only scalar data, not the full
trajectory, and hence one must begin by reconstructing the full dynamics from
that data—e.g., via delay-coordinate reconstruction. Success of this
reconstruction procedure depends on several free parameters. In practice, the
embedding theorems provide little guidance regarding how to choose these
parameters; though a number of creative strategies have been developed, these
methods require good data and input from a human expert. Moreover, the
delay-coordinate reconstruction machinery (both theorems and heuristics)
targets the computation of dynamical invariants like the correlation dimension
and the Lyapunov exponent. If one just wants to extract the topological
structure of an invariant set, as we show in this paper, that full machinery
may not be needed. Nevertheless, there are issues of scale and scale
parameters here, as with the standard machinery. Moreover, real-world data
sets have finite length, sample some underlying set at a
finite time interval, have finite precision, and may be contaminated by noise. In 
the face of these issues, one obviously cannot compute the topology to arbitrary 
precision. 

Coarse-graining the topological analysis of data also addresses another issue: 
the associated computations are expensive, and that expense grows with the 
number of simplices in the complexes that one constructs during that process. 
The pioneering work in this area used cubical complexes and multivalued maps 
for this purpose [2], and these results can be computationally rigorous even in 
the face of noise. For more efficiency, one can use a simplicial complex that 
follows the natural geometry of the data—e.g., the witness complex of [3]. To 
construct a witness complex, one chooses a set of “landmarks,” typically a 
subset of the data, that become the vertices of the complex. The connections 
between the landmarks are determined by their nearness to the rest of the 
data—the “witnesses.” Two landmarks in the complex are joined by an edge, 
for instance, if they share at least one witness. As described in §2, there are many 
possible definitions for a witness “relation.” The one that we use includes a scale 
parameter, ε, intended to provide a measure of noise immunity. The ideas of
persistent homology [4, 5] can be used to choose ε, build the complex, and then
explore the changes in its topology with changing reconstruction dimension. 

Our initial work on this approach suggests that, for a number of different 
dynamical systems, the witness complex correctly resolves the homology of the
underlying invariant set even if the reconstruction dimension is well below the 
thresholds for which the embedding theorems assure smooth conjugacy between 
the true and reconstructed dynamics. This paper reports upon an exploration 
of that conjecture in the context of the classic Lorenz system and suggests some 
implications and applications. To set the stage for that discussion, the rest of 
this section gives a brief review of delay-coordinate reconstruction. The witness 
complex is covered in more depth in §2, which also describes the notion of 
persistence and demonstrates how that idea is used to choose scale parameters 
for a complex built from reconstructed time-series data. In §3, we explore how 
the homology of such a complex changes with reconstruction dimension. 

Delay-coordinate reconstruction [6, 7] is arguably the most well-established
technique for reconstructing the dynamics of a system from scalar time-series
data. Suppose that Y is a point on a compact invariant set M ⊂ ℝ^d and
Y(t) represents its trajectory. A smooth measurement function h : M → ℝ
gives rise to a scalar time series, x(t) = h(Y(t)), from that trajectory. Then the
delay-coordinate map, F : M → ℝ^m,

$$F(Y(t); h, m, \tau) = \bigl(x(t),\, x(t-\tau),\, \ldots,\, x(t-(m-1)\tau)\bigr), \tag{1}$$

is almost always a diffeomorphism whenever τ > 0 and m is large enough,
i.e., m > 2 d_box, where d_box is the box-counting dimension of M [8]. When
these conditions are met, the reconstructed attractor and the true attractor are
diffeomorphic, and thus certainly have the same topology. The left panel of
Fig. 1 shows an example: a trajectory from the classic Lorenz system [9]. The
middle panel shows the corresponding time series of the x coordinate of that
trajectory (i.e., h(x, y, z) = x), and the right panel shows a delay-coordinate
reconstruction using τ = 174T, where T is the interval between points in the
time series. Note that a reconstruction dimension of five (m = 5) is required
in order to satisfy the m > 2 d_box requirement for this attractor, since
d_box ≈ 2.06. Of course, it is not easy to display the 5D picture; Fig. 1(c) shows a 3D
projection of this reconstruction.

Figure 1: Classic Lorenz attractor (r = 28, b = 8/3, σ = 10): (a) A 10^5-point
trajectory in ℝ^3 generated using fourth-order Runge-Kutta with a time step of
T = 0.001. (b) A time-series trace of the x coordinate of that trajectory. (c)
A 3D projection of a delay-coordinate embedding with dimension m = 5 and
delay τ = 174T, following (1).
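
The delay-coordinate map in (1) can be sketched in a few lines of Python. This is only an illustrative sketch, not the code used for the figures: the function name `delay_embed` and the convention that the delay is given in samples (for example, 174 for the reconstruction of Fig. 1(c)) are assumptions made here.

```python
import numpy as np

def delay_embed(x, m, lag):
    """Return the m-dimensional delay-coordinate reconstruction of x.

    Row i is (x[t], x[t - lag], ..., x[t - (m - 1) * lag]) with
    t = i + (m - 1) * lag, mirroring Eq. (1). `lag` is the delay in
    samples (hypothetical parameter name).
    """
    x = np.asarray(x, dtype=float)
    span = (m - 1) * lag          # samples consumed by the delays
    n = len(x) - span             # number of reconstructed points
    if n <= 0:
        raise ValueError("time series too short for this m and lag")
    # Column j holds the series delayed by j * lag samples.
    cols = [x[span - j * lag : span - j * lag + n] for j in range(m)]
    return np.column_stack(cols)
```

For a series of N samples the result has N - (m - 1) * lag rows, so the reconstructed trajectory gets shorter as m grows.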

In practice, one is presented with a scalar time series so that the dimension
d of the original state space is unknown, and one cannot compute d_box without
first embedding the data. Thus, choosing the reconstruction dimension m is a
challenge. There are a number of heuristics for doing so. Perhaps the most
well-known is the family of false near-neighbor methods pioneered in [10]. The basic
idea behind this class of methods is to increase the reconstruction dimension
until the geometry of the neighbor relationships stabilizes; this is taken to
indicate that any false crossings created by the measurement function h have been
eliminated and the dynamics are properly unfolded. The choice of the delay τ
also plays a role in this unfolding. Though the theorems only require τ > 0,
in practice one needs to ensure that τ is large enough to make the coordinates
numerically independent, but not so large that the coordinates become causally
unrelated [11]. The standard approach for this—which we used to select the τ
value in Fig. 1(c)—is to calculate the time-delayed average mutual information
of the time series and choose τ at the first minimum of that curve [12]. There
are many other methods for estimating both m and τ; see [13] for a deeper
discussion. All of these procedures are subtle and subjective. Invoking them and
interpreting their results requires good data and expert knowledge; the
false near-neighbor method, for instance, has been shown to regularly overestimate
the embedding dimension when noise is present in the time series—something that
is unavoidable in experimental data.
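
As a rough illustration of the first-minimum heuristic, the sketch below estimates the time-delayed average mutual information with a plain two-dimensional histogram and returns the first lag at which that curve has a local minimum. The function names, the bin count, and the histogram estimator are all assumptions made for this sketch; the estimator of [12] and production implementations differ in detail.

```python
import numpy as np

def average_mutual_information(x, lag, bins=32):
    """Histogram estimate (in nats) of the mutual information
    between x(t) and x(t + lag). `bins` is an arbitrary choice."""
    x = np.asarray(x, dtype=float)
    joint, _, _ = np.histogram2d(x[:-lag], x[lag:], bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of x(t)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of x(t + lag)
    mask = pxy > 0                            # skip empty cells
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def first_minimum_lag(x, max_lag, bins=32):
    """Return the first lag (in samples) at which the AMI curve has a
    local minimum, or None if no minimum occurs before max_lag."""
    ami = [average_mutual_information(x, lag, bins)
           for lag in range(1, max_lag + 1)]
    for i in range(1, len(ami) - 1):
        if ami[i] < ami[i - 1] and ami[i] <= ami[i + 1]:
            return i + 1                      # ami[0] corresponds to lag 1
    return None
```

Histogram estimates fluctuate from lag to lag, so in practice the curve is usually inspected by eye (or smoothed) before a minimum is accepted, which is one source of the subjectivity noted above.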

In this paper, we adopt the philosophy that one might only desire knowledge 
of the topology of the invariant set, and we conjecture that this might be possible 
with a lower reconstruction dimension than that needed to obtain a true
“embedding.” That is, the reconstructed dynamics might be homeomorphic to the
original dynamics at a lower dimension than that needed for a diffeomorphically 
correct embedding. We will return to this idea below. 


2. Witness Complexes for Dynamical Systems 

To compute the topology of a data set that samples an invariant set of a 
dynamical system, we need a complex that captures the shape of the data, but 
is robust with respect to noise and other sampling issues. To do so efficiently, 
the complex should be parsimonious. A witness complex is an ideal choice 
for these purposes. Such a complex is determined by the reconstructed
time-series data, W ⊂ ℝ^m—the witnesses—and an associated set L ⊂ ℝ^m, the
landmarks, which can (but need not) be chosen from among the witnesses. The
landmarks form the vertex set of the complex; the connections between them
are dictated by the geometric relationships between W and L. In a general
sense, a witness complex can be defined through a relation R(W, L) ⊂ W × L.
As Dowker noted [14], any relation gives rise to a pair of simplicial complexes.
We will use one: a point w ∈ W is a witness to an abstract k-dimensional
simplex σ = {l_{i_1}, l_{i_2}, …, l_{i_{k+1}}} ⊂ L whenever {w} × σ ⊂ R(W, L). The collection
of simplices that have witnesses is a complex relative to the relation R. For
example, two landmarks are connected if they have a common witness—this is 
a one-simplex. Similarly, if three landmarks have a common witness, they form 
a two-simplex, and so on. 

There are many possible definitions for a witness relation R. One very
natural construction is to use the matrix D(W, L) of distances D_ij = ‖w_i − l_j‖
to define R. Sorting each row of this matrix from smallest to largest determines
the set of landmarks that are closest to each witness. A relation corresponds
to assigning a cut-off, which thereby determines the simplices witnessed by w_i.
For example, one can choose a fixed number (viz., k-nearest neighbors), a strict
size (neighbors within some distance), or an increment. The first concept gives
the “weak witness complex” of de Silva and Carlsson [3], but suffers from the
problem that there is no limit on the distance to the nearest neighbors and thus
a simplex might be too spread out. The second notion seems too restrictive:
a portion of the invariant set M that has a low density may not be covered
enough to be represented in the complex. The third idea is a compromise and
gives the notion of an ε-weak witness [15], or what we call a “fuzzy” witness
[16]: a point witnesses a simplex if its distance to every landmark in that
simplex is within ε of its distance to the closest landmark:

Definition (Fuzzy Witness). The fuzzy witness set for a point l ∈ L is the set
of witnesses

$$W_\epsilon(l) = \{\, w \in W : \|w - l\| \le \min_{l' \in L} \|w - l'\| + \epsilon \,\}. \tag{2}$$

In this case the relation consists of the collections R = ∪_{l∈L} (W_ε(l) × {l}), and a
simplex σ is in the complex whenever ∩_{l∈σ} W_ε(l) ≠ ∅, that is, all of its vertices
share a witness.

The fuzzy witness complex reduces to the “strong witness complex” of de Silva
and Carlsson [3] when ε = 0. In such a complex, an edge exists between two
landmarks iff there exists a witness that is exactly equidistant from those
landmarks. This is not a practical notion of shared closeness for finite data sets. A
simpler implementation of the fuzzy witness complex gives a “clique” or “flag”
complex, analogous to the Rips complex [17], that consists of simplices whose
pairs of vertices have a common witness (this is called a “lazy” complex in [3]).
In this case the complex is

$$\mathcal{K}_\epsilon(W, L) = \{\, \sigma \subset L : W_\epsilon(l) \cap W_\epsilon(l') \neq \emptyset,\ \forall\, l, l' \in \sigma \,\}. \tag{3}$$

Following [16], we will use this particular construction because it minimizes
computational complexity.
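
To make the construction concrete, the following Python sketch computes the one-skeleton of the clique complex in (3) directly from the fuzzy witness relation in (2). It is a minimal illustration under stated assumptions, not the javaPlex implementation used in this paper: the function name `fuzzy_witness_edges` and the dense distance matrix are choices made here for brevity.

```python
import numpy as np
from itertools import combinations

def fuzzy_witness_edges(witnesses, landmarks, eps):
    """Edges (j, k), j < k, between landmarks that share a fuzzy witness."""
    W = np.asarray(witnesses, dtype=float)    # shape (n_witnesses, dim)
    L = np.asarray(landmarks, dtype=float)    # shape (n_landmarks, dim)
    # D[i, j] = ||w_i - l_j||, built one landmark at a time to save memory.
    D = np.stack([np.linalg.norm(W - l, axis=1) for l in L], axis=1)
    # Relation (2): w_i witnesses l_j if its distance to l_j is within
    # eps of its distance to the closest landmark.
    R = (D <= D.min(axis=1, keepdims=True) + eps).astype(int)
    # shared[j, k] counts the witnesses common to l_j and l_k.
    shared = R.T @ R
    return {(j, k) for j, k in combinations(range(len(L)), 2)
            if shared[j, k] > 0}
```

Higher-dimensional simplices of the clique complex are then the cliques of this graph: three landmarks span a two-simplex when all three of their pairwise edges are present, and so on.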

Figure 2 shows four witness complexes built in this fashion from the
100,000-point trajectory of the Lorenz system that is shown in Fig. 1(a). The landmarks
(red dots) consist of ℓ = 201 points equally spaced along the trajectory, i.e.,
every Δt = 500th point of the time series. When ε is small, very few witnesses
fall in the thin regions required by (2), so the resulting complex does not have
many edges and is thus not a good representation of the shape of the data.
As ε grows, more witnesses fall in the “shared” regions and the complex fills
in, revealing the basic homology of the attractor of which the trajectory is a
sample. There is an obvious limit to this, however: when ε is very large,
even the largest holes in the complex are obscured.

Studying the change in homology under changing scale parameters is a
well-established notion in computational topology. The underlying idea of
persistence [4, 5, 18] is that any topological property of physical interest should be
(relatively) independent of parameter choices in the associated algorithms. One
useful way to represent information about the changing topology of a complex is
the barcode persistence diagram [17]. Fig. 3 shows barcodes of the first two Betti
numbers for the witness complexes of Fig. 2. Each horizontal line in the barcode
is the interval in ε for which there exists a particular non-bounding cycle; thus
the number of such lines is the rank of the homology group—a Betti number.
We computed these values for β₀ and β₁ using javaPlex [19] over the range
0.017 < ε < 1.7, using the “explicit” landmark selector to choose the equally
spaced points and the “lazy” witness stream to obtain a clique complex from
the ℓ = 201 landmarks. There were no three-dimensional voids, i.e., β₂ was
always zero for this range of ε—a reasonable implication for the 2.06-dimensional
attractor. When ε is very small, as in Fig. 2(a), the witness complex has many
components and the β₀ barcode shows a large number of entries. As ε grows,
the spurious gaps between these components disappear, leaving a single
component that persists above ε ≈ 0.014. That is, witness complexes constructed with
ε > 0.014 correctly capture the connectedness of the underlying attractor. The
β₁ barcode plot shows a similar pattern: there are many holes for small ε that
are successively filled in as that parameter grows, leaving the two main holes
(i.e., β₁ = 2) for ε > 1.01. For ε > 3.2 (not shown in Fig. 3), one of those
holes disappears; eventually, for ε > 4.05, the complex becomes topologically
trivial. Above this value, the resulting complexes—recall Fig. 2(d)—have no
non-contractible loops and are homologous to a point (acyclic).

Figure 2: Varying the fuzziness parameter ε: One-skeletons of clique complexes
K_ε(W, L) constructed from the trajectory of Fig. 1(a) using 201 landmarks (red
dots) and four values of ε.
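
For a single value of the scale parameter, the two lowest Betti numbers of a clique complex can be computed from its edge list alone with elementary linear algebra. The sketch below is an assumption-laden illustration rather than the javaPlex computation reported here: it works over mod-2 coefficients, enumerates triangles by brute force, and uses the hypothetical names `gf2_rank` and `clique_betti_numbers`.

```python
import numpy as np
from itertools import combinations

def gf2_rank(M):
    """Rank of a 0/1 matrix over GF(2) by Gaussian elimination."""
    M = np.array(M, dtype=np.uint8) % 2
    rows, cols = M.shape
    rank = 0
    for c in range(cols):
        pivot = next((r for r in range(rank, rows) if M[r, c]), None)
        if pivot is None:
            continue
        M[[rank, pivot]] = M[[pivot, rank]]   # move pivot row into place
        for r in range(rows):
            if r != rank and M[r, c]:
                M[r] ^= M[rank]               # clear column c elsewhere
        rank += 1
        if rank == rows:
            break
    return rank

def clique_betti_numbers(n_vertices, edges):
    """Return (beta_0, beta_1) of the clique complex of a graph."""
    edges = sorted({tuple(sorted(e)) for e in edges})
    index = {e: i for i, e in enumerate(edges)}
    # Two-simplices of the clique complex are the triangles of the graph.
    triangles = [t for t in combinations(range(n_vertices), 3)
                 if all(p in index for p in combinations(t, 2))]
    d1 = np.zeros((n_vertices, len(edges)), dtype=np.uint8)
    for j, (a, b) in enumerate(edges):
        d1[a, j] = d1[b, j] = 1               # boundary of an edge
    d2 = np.zeros((len(edges), len(triangles)), dtype=np.uint8)
    for j, t in enumerate(triangles):
        for p in combinations(t, 2):
            d2[index[p], j] = 1               # boundary of a triangle
    r1, r2 = gf2_rank(d1), gf2_rank(d2)
    return n_vertices - r1, len(edges) - r1 - r2
```

Repeating this computation over a range of scale values gives the counts that a barcode summarizes, although a barcode additionally tracks which individual cycles persist from one value to the next.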

This notion of persistence can be turned around and used to select good
values for the parameters that play a role in topological data analysis—i.e.,
looking for the ε value at which the homology stabilizes. However, definitions
of what constitutes stabilization are subjective and can be problematic. This
kind of issue turns up routinely in nonlinear time-series analysis [13]. Even so,
persistence is a powerful technique and we make use of it in a number of ways
here.

Figure 3: Persistence barcodes computed using javaPlex for an ℓ = 201 witness
complex of the trajectory of Fig. 1(a). Each plot tabulates the two lowest Betti
numbers of the complex for 100 values of the scale parameter ε. The left panel
shows the behavior when 0.001 < ε < 0.1, and the right 0.017 < ε < 1.7.

Another critical step in our approach is the selection of the landmarks that 
constitute the vertex set of the witness complex. For efficiency, the number of 
landmarks should be much smaller than the number of points in the time series, 
but for efficacy they should be distributed so as to capture the shape of the data. 
A simple method for this, advocated by [3], is to use a max-min algorithm that
chooses L ⊂ W by selecting the farthest point in W from the previous selection
and iterating until desired density and sparseness requirements are satisfied,
if possible. For data from a dynamical system, one can alternatively exploit
the natural temporal ordering and select points that are equally spaced in time
(Δt) along the trajectory. If the attractor is ergodic, this will distribute the
landmarks evenly relative to its invariant measure. One advantage of this strategy
in the context of delay-coordinate reconstruction, as discussed in §3 below, is
that it allows both witnesses and landmarks to be consistently moved from one
embedding dimension to another. Thus we adopt this approach here. Needless
to say, the choice of the time interval between landmarks involves a tradeoff
between accuracy and computational efficiency.
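
Both landmark-selection strategies are easy to sketch. The code below is illustrative only: the function names are hypothetical, and `maxmin_landmarks` follows the usual formulation of max-min selection (each new landmark maximizes its distance to the nearest landmark already chosen), which is an assumption about the details of the procedure in [3].

```python
import numpy as np

def equally_spaced_landmarks(n_points, stride):
    """Indices of every stride-th point along the trajectory."""
    return np.arange(0, n_points, stride)

def maxmin_landmarks(points, n_landmarks, first=0):
    """Indices chosen greedily so that each new landmark is the point
    farthest from all landmarks selected so far."""
    P = np.asarray(points, dtype=float)
    chosen = [first]
    # dist[i] = distance from point i to its nearest chosen landmark.
    dist = np.linalg.norm(P - P[first], axis=1)
    while len(chosen) < n_landmarks:
        nxt = int(np.argmax(dist))
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(P - P[nxt], axis=1))
    return np.array(chosen)
```

Because the equally spaced variant returns time indices rather than coordinates, the same index set can be reused unchanged for reconstructions of any dimension.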

In practice, we use persistence to select the number of landmarks: i.e., given
a trajectory, we build the complex for different ℓ values, calculate the homology,
and choose a value at which the results stabilize. In our experience, the
homology of the witness complex appears to be highly robust to the landmark spacing,
but that issue will require more exploration, as described in §4. Fig. 4 shows a
series of examples: fuzzy witness complexes constructed from the trajectory of
Fig. 1(a), increasing ℓ from 26 to 5001. As one would expect, the complexes fill
in as the number of landmarks increases. Visually, the two main holes become
apparent in the ℓ = 101 complex, which raises the question: does one really need
the extra structure of an ℓ ≫ 100 complex if the goal is to resolve the large-scale
topology of the attractor?

Figure 4: Varying the number of landmarks ℓ. One-skeletons of witness
complexes (3) constructed from the trajectory of Fig. 1(a) with ε = 1.2 and six
values of ℓ.


The number of landmarks required for this will obviously depend, in a
complex way, on the structure of the underlying invariant set. In the simplest
case—if this set were a Riemannian submanifold—the results of [20] imply that,
if the “feature size” of the manifold is not too small, a well-defined number
of sample points are needed to reconstruct the topology of the manifold—with
high probability—from a Čech complex with a given ball size. It is also
possible to guarantee that the witness complex has the same topology as an alpha
complex, under appropriate conditions on the density of the sets L and W and
with appropriate selection of α and ε [16, 21].

For the case at hand, however, the invariant set is a fractal, and it is
impossible to obtain all of its structure from a finite sample, even though we expect
to resolve more of the complexity with more landmarks. The witness complex 
in Fig. 4(f), for instance, resolves some of the spiral-shaped gaps in the wings of 
the attractor. For a fixed data set, however, there is a fundamental limit: it does 
not make sense to think about resolving the fine-grained structure of an object 
beyond the limits that are inherent in a finite-length, finite-precision sampling 
of that object. Since all data have these limitations, and all real-world data 
are noisy, it makes sense to content ourselves with an approximate topology. In 
this case, the roughest topology corresponds to the Williams branched manifold
model of the Lorenz attractor [22], which has the homology of a figure-eight,
i.e., β₀ = 1 and β₁ = 2. This topology corresponds to that implied by the longest
bars in the persistence diagram of Fig. 3(b).

The witness complex elegantly balances effectiveness and efficiency in
topological data analysis. Its use of a small number of landmarks sidesteps the
issues that arise when one builds a fine-grained complex that touches every
data point, which is computationally expensive and noise-sensitive. One could
also use cubical complexes to address those issues, as in [2], but the witness
complex is computationally less demanding, both because it naturally follows
the data and because of the “tight” way that a simplicial complex covers a
space. Among other things, the dimension of each simplex in the clique
complex can be restricted to be just high enough to cover the corresponding part of
the invariant set, whereas all of the grid elements in the cubical case
necessarily have the dimension of the ambient space. This means that computational
homology algorithms like javaPlex not only have fewer cells to process in the
simplicial case, but also far fewer neighbors to check—e.g., during computations
of isolating neighborhoods. See [16, Appendix C] for a detailed analysis of the
associated computational costs. Witness complexes have begun to see some use
in topological data analysis [15], but they are not completely immune to the
foibles of real-world data. Complexes constructed using the standard witness
relations, for example, can contain “false positive” edges due to distant
witnesses (in the case of the weak relation) and “false negative” edges because of
the strong relation’s very stringent requirement on shared witnesses, which is
senseless in noisy, incompletely sampled data. The fuzzy witness relation used
here is intended to mitigate these issues. Of course, this approach is not without
disadvantages. Both the landmark spacing Δt that makes witness complexes
computationally efficient and the ε that makes the fuzzy witness complex robust
with respect to noise are parameters that one needs to choose, and choose well.
Moreover, these choices interact, as described in the following section.

The examples presented in this section involve a full trajectory from a
dynamical system. In the real world, however, one is generally working with
reconstructions of scalar time-series data—structures whose topology is
guaranteed to be identical to that of the underlying dynamics if the reconstruction
process is carried out properly. But what if the dimension m does not satisfy
the requirements of the theorems? Can one obtain useful results about the
topology of that underlying system using these ideas, even if those dynamics are
not properly unfolded in the sense of [6, 7, 8]? It is to this issue that we turn 
next. 


3. Topologies of reconstructions 

Scalar time-series data from a dynamical system are a projection of the
d-dimensional dynamics onto ℝ¹—an action that does not automatically preserve
the topology of the object. The method of delays allows one to reconstruct the
underlying dynamics, up to diffeomorphism, if the reconstruction dimension is
large enough. There are a number of conditions for the successful execution of
this procedure, as mentioned in §1. According to [7], seven dimensions (m > 2d)
are almost always sufficient to reconstruct the structure of the Lorenz attractor
from the time series of Fig. 1(b); the looser bounds of [8], however, suggest
that m = 5 is sufficient, since the box-counting dimension of that attractor is
2.06. Since the state-space dimension is generally unknown and one needs an
embedding to compute d_box, choosing m is a primary challenge in nonlinear
time-series analysis. The question we wish to address in this section is: can one
use the witness complex to obtain a useful, coarse-grained description of the
topology from lower-dimensional reconstructions—say, the basic connectivity
or number of holes in an attractor that are larger than a certain scale?

The short answer to that question is yes. Figure 5 shows a side-by-side
comparison of witness complexes and barcode diagrams for the Lorenz trajectory
of Fig. 1(a) and a two-dimensional reconstruction (m = 2) using the x coordinate
of that trajectory. For the full 3D trajectory on the left, javaPlex needed 6942
simplices to resolve the two main holes in the attractor, with ε = 1.2. For the
2D reconstruction on the right—constructed with τ = 0.174, the first minimum
of the time-delayed mutual information—the witness complex with ε = 0.2 has
only 1916 simplices but has the same homology as the 3D complex.¹ In other
words, the correct large-scale homology is accessible from a witness complex of a
2D reconstruction, in a computationally efficient manner, even though (a) that
complex does not involve all of the data points and (b) the reconstruction does
not satisfy the conditions of the associated theorems.

And that leads to the central question of this paper: how does the topology 
of the witness complex change with the reconstruction dimension m? Intuitively, 
one would think that the homology of the witness complex would change until
m became large enough to correctly unfold the topology of the underlying
attractor, and then stabilize. In practice, however, if m is too large, the so-called
“curse of dimensionality,” in which a finite amount of data is spread over a large
volume, will destroy the fidelity of the complex. Additionally, the effect of noise
will be amplified and the computational expense will grow with m. For all of
these reasons, it would be a major advantage if one could obtain useful
information about the homology of the underlying attractor from a low-dimensional
delay-coordinate reconstruction of scalar time-series data. 

Again, it appears that this is possible. Figure 6 shows witness complexes for
m = 2 and m = 3 reconstructions of the Lorenz time series of Fig. 1(b). The
barcodes for the first two Betti numbers for these two complexes, as computed using
javaPlex, have similar structure: the complexes become connected (β₀ = 1) at
a small value of ε, and the dominant, persistent homology corresponds to the
two primary holes (β₁ = 2) in the attractor. Note, by the way, that Fig. 6(a) is
not simply a 2D projection of Fig. 6(b); the edges in each complex reflect the
geometry of the witness relationships in different spaces, and so may differ. (The
landmark set in the 2D reconstruction is a projection of the landmark set in
the 3D reconstruction, however, because we apply the same “lifting” operation
to all points.) Higher-dimensional reconstructions—not easily displayed—have
the same homology for suitable choices of ε, though for m > 5, it is necessary
to increase the number of landmarks to obtain a persistent β₁ = 2.

That brings up an important point: if one wants to sensibly compare witness
complexes constructed from different reconstructions of a single data set, one
has to think carefully about the ℓ and ε parameters. Here, we used persistence
to choose a good value of ℓ. We found that the results were robust with respect
to changes in that value, across all reconstruction dimension values in this study,
so we fix ℓ ≈ 200 for all the experiments reported here.² Since the number of
data points required to properly sample an object should generally grow with
dimension [23], this will require more exploration, as mentioned in §4.

¹ The difference between the ε values that yield a persistent result in these two cases makes
sense because the diameters of the true and reconstructed attractors are different. This issue
is discussed further below.

Figure 5: One-skeletons of the witness complexes (top row) and barcode
diagrams for β₁ (bottom row) of the Lorenz system. The plots in the left-hand
column were computed from the three-dimensional (x, y, z) trajectory of Fig. 1(a);
those in the right-hand column were computed from a two-dimensional (m = 2)
delay-coordinate reconstruction from the x coordinate of that trajectory with
τ = 174T. In both cases, ℓ = 201 equally spaced landmarks (red ×s) were used.
Both complexes have two persistent nonbounding cycles (green and blue edges)
but the 2D reconstruction requires only ≈ 1900 simplices to resolve those cycles
(at ε = 0.2), while the full 3D trajectory requires ≈ 7000 simplices (at ε = 1.2)
to eliminate spurious loops.

In the experiments in the previous section, the scale parameter ε was given in
absolute units. To generalize this approach across different examples, it makes
sense to compare reconstructions with ε chosen to be a fixed fraction,

$$\epsilon = \delta \, \mathrm{diam}(W), \tag{4}$$

of the diameter, diam(W), of the set W. For example, for the full 3D attractor
in Fig. 1(a),

$$\mathrm{diam}(W_{xyz}) = \sqrt{(x_{\max} - x_{\min})^2 + (y_{\max} - y_{\min})^2 + (z_{\max} - z_{\min})^2} \approx 75.3,$$

so the ε values used in Fig. 3—0.017 < ε < 1.7 in absolute units—translate to
2.3 × 10⁻⁴ < δ < 0.023 in this diameter-scaled measure.

² The precise value varies slightly because the length of a trajectory reconstructed from a
fixed-length data set decreases with increasing m (since one needs a full span of m × (τ/T)
data points to construct a point in the reconstruction space).

Figure 6: The effect of reconstruction dimension: One-skeletons of witness
complexes of different reconstructions of the scalar time series of Fig. 1(b).
Both reconstructions use τ = 0.174, the first minimum of the average
time-delayed mutual information [12], ℓ = 198 equally spaced landmarks (red dots),
and δ = 0.54%, as defined in (4).

The diameter of the reconstruction varies in a natural way with the
dimension m. Since delay-coordinate reconstruction of scalar data unfolds the
full range of those data along every added dimension, the diameter of an
m-dimensional reconstruction will be

$$\mathrm{diam}(W_m) = \sqrt{m\,(x_{\max} - x_{\min})^2} \approx 37.0\,\sqrt{m},$$

where x represents the scalar time-series data. Since this unfolding will change
the geometry of the reconstruction, we need to scale ε accordingly. The witness
complexes in Fig. 6 were constructed with a fixed value of δ = 0.54%. Thus,
for Fig. 6(a), ε = 37.0√2 (0.0054) = 0.283 in absolute units, while for Fig. 6(b),
diam(W₃) = 37.0√3 and ε = 0.346. Together with the even temporal
spacing of landmarks, this scaling of ε—which is used throughout the rest of this
section—should allow the witness complex to adapt appropriately to the effects
of changing reconstruction dimension and finite data.
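
The arithmetic behind these scaled values is simple enough to check directly. In the short sketch below, the function names are hypothetical and the data range of 37.0 is taken from the text above; the fraction argument is the fixed fraction of (4), written as a plain number rather than a percentage.

```python
import math

def reconstruction_diameter(x_range, m):
    """Diagonal of the m-dimensional box whose sides all equal the
    range of the scalar data, x_max - x_min."""
    return math.sqrt(m) * x_range

def scaled_epsilon(fraction, diameter):
    """Absolute scale parameter for a given fraction of the diameter."""
    return fraction * diameter

for m in (2, 3):
    diam = reconstruction_diameter(37.0, m)
    print(m, round(diam, 1), round(scaled_epsilon(0.0054, diam), 3))
```

Holding the fraction fixed therefore makes the absolute scale grow like the square root of m, in step with the diameter of the reconstruction.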

To formalize the exploration of the reconstruction homology and extend that
study across multiple dimensions, one can use a variant of the classic barcode
diagram that shows, for each simplex, the reconstruction dimension values at
which it appears in and vanishes from the complex. Fig. 7(a) shows such a plot
for edges that involve the first landmark on the reconstructed trajectory. A
number of interesting features are apparent in this image. Unsurprisingly, most
of the one-simplices that exist in the m = 1 witness complex, many of which are
likely due to the strong effects of the projection of the underlying trajectory
onto ℝ¹, vanish when one moves to m = 2. There are other short-lived edges
in the complex as well: e.g., the edge from l₀ to l₁₂₀ that is born at m = 2 and
dies at m = 3. The sketch in Fig. 7(b) demonstrates how edges can be born as
m increases: in m = 2, l₁ and l₃ share a witness (the green square); when one
moves to m = 3, spreading all of the points out along the added dimension, that
witness is moved far from l₃ and into the shared region between l₁ and l₂.
There are also long-lived edges in the complex of Fig. 7(a). The one between l₀
and l₁₄₀ that persists from m = 1 to m = 8 is particularly interesting: this pair
of landmarks has shared witnesses in the scalar data and in all reconstructions.
Possible causes for this are explored in more depth below. All of these effects
depend on ϵ, of course; decreasing ϵ will decrease both the number and average
length of the edge persistence bars.
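
The bookkeeping behind such a dimension barcode can be sketched as follows.
This is an illustrative example only: it assumes that, for each edge, one has
already recorded the set of reconstruction dimensions at which the edge is
present in the witness complex, and the function name and toy inputs are
hypothetical.

```python
def edge_bars(dims_present, m_max):
    """Return (birth, death) pairs for one edge.

    birth is the first dimension of a run in which the edge is present;
    death is the first dimension at which it is absent again, or None if
    the edge is still present at m_max.
    """
    present = set(dims_present)
    bars = []
    birth = None
    for m in range(1, m_max + 1):
        if m in present and birth is None:
            birth = m                      # edge appears at the (m-1) -> m transition
        elif m not in present and birth is not None:
            bars.append((birth, m))        # edge vanishes at the (m-1) -> m transition
            birth = None
    if birth is not None:
        bars.append((birth, None))         # still alive when the sweep ends
    return bars


print(edge_bars([2], 8))              # [(2, 3)]: born at m = 2, dies at m = 3
print(edge_bars(range(1, 9), 8))      # [(1, None)]: present for every m
print(edge_bars([1, 2, 5, 6], 8))     # [(1, 3), (5, 7)]
```

An edge can have more than one bar, since edges may be destroyed and later
re-created as m grows.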



Figure 7: (a) Dimension barcode for edges in the witness complex of the
reconstructed scalar time series of Fig. 1(b) that involve l₀, the first landmark,
for reconstructions with m = 1, …, 8. The vertical axis is labeled with the
indices of the remaining 197 landmarks in the complex; a (green) circle at the
(m − 1) → m tickmark on the horizontal axis indicates the transition at which an
edge between l₀ and lᵢ is born; a (red) square indicates the transition at which
that edge vanishes from the complex. A (blue) diamond at the right-hand edge
of the plot indicates an edge that was still stable when the algorithm completed.
For all reconstructions, τ = 0.174, ℓ = 198, and ϵ = 0.54%. (b) Sketch of the
birth of an edge at the m = 2 → 3 transition.

While this Δm barcode image is interesting, the amount of detail that it
contains makes it somewhat unwieldy. To study the m-persistence of all of the
ℓ × ℓ edges in a witness complex, one would need to examine ℓ of these plots, or
condense them into a single plot with ℓ² entries on the vertical axis. Instead,
one can plot what we call an edge lifespan diagram: an ℓ × ℓ matrix whose
(i, j)th pixel is colored according to the maximum range of m for which an edge
exists in the complex between the ith and jth landmarks; see Fig. 8. If the edge
{lᵢ, lⱼ} existed in the complex for 2 ≤ m < 3 and 5 ≤ m < 8, for instance, Δm
would be three and the (i, j)th pixel would be coded in cyan. Edges that do not
exist for any dimension are coded white.
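
A lifespan matrix of this kind can be assembled from per-edge presence data.
The sketch below is not the authors' implementation. It assumes that the
lifespan of an edge is the length of its longest run of consecutive
reconstruction dimensions, so that an edge present only at m = 1 has lifespan
one; the names and the toy edge list are hypothetical.

```python
def max_lifespan(dims_present):
    """Length of the longest run of consecutive dimensions with the edge present."""
    best = 0
    run = 0
    previous = None
    for m in sorted(set(dims_present)):
        run = run + 1 if previous is not None and m == previous + 1 else 1
        best = max(best, run)
        previous = m
    return best


def lifespan_matrix(edge_dims, n_landmarks):
    """Symmetric matrix of maximal lifespans; 0 marks an edge that never exists."""
    matrix = [[0] * n_landmarks for _ in range(n_landmarks)]
    for (i, j), dims in edge_dims.items():
        matrix[i][j] = matrix[j][i] = max_lifespan(dims)
    return matrix


# Toy data: landmark index pairs mapped to the dimensions at which the edge exists.
edge_dims = {
    (0, 1): [2, 5, 6, 7],    # two separate runs; the longer one has length 3
    (0, 2): [1],             # present only in the scalar (m = 1) complex
    (1, 3): range(1, 9),     # present for every m from 1 to 8
}
for row in lifespan_matrix(edge_dims, 4):
    print(row)
# prints:
# [0, 3, 1, 0]
# [3, 0, 0, 8]
# [1, 0, 0, 0]
# [0, 8, 0, 0]
```

The matrix can then be passed to any image-plotting routine with a discrete
colormap, reserving white for the zero entries.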

Figure 8: Edge lifespan diagram: pixel (i, j) on this image is color-coded
according to the maximum range Δm of dimension for which an edge exists between
landmarks lᵢ and lⱼ in the witness complex of the reconstructed scalar time
series of Fig. 1(b) for m = 1, …, 8. For all reconstructions, τ = 174, ℓ = 198,
and ϵ = 0.54%.

A prominent feature of Fig. 8 is a large number (683) of edges with a lifespan
of 1 (blue). Of these edges, 463 exist for m = 1, but not for m = 2, and thus
reflect the anomalous behavior of projecting a 2.06-dimensional object onto a
line. This was also seen, as described above, in the barcode of Fig. 7.

Another feature observed in the lifespan diagram is a number of diagonal line
segments. Note that the color of the pixels in the segments varies, though most
of these segments correspond to edges with longer lifespans. These segments
indicate the existence of Δm-persistent edges {lᵢ, lⱼ}, {lᵢ₊₁, lⱼ₊₁}, {lᵢ₊₂, lⱼ₊₂}, ….
This is likely due to the continuity of the dynamics [24]. Recall that the
landmarks are evenly spaced in time, so lᵢ₊₁ is the Δt-forward image of lᵢ. Thus
a diagonal segment may indicate that the Δt-forward image of (at least one)
witness that is shared between lᵢ and lⱼ is shared between lᵢ₊₁ and lⱼ₊₁, and
so on. The lengths of the longer line segments suggest that this continuity
fails after 5–10 Δt steps, probably because of the positive Lyapunov exponents
on the attractor. A simple check on this reasoning would be an edge lifespan
diagram for a dynamical system with a limit cycle. The structure of this plot
(not shown) is dominated by diagonal lines of high Δm-persistence, with a few
other scattered one-persistent edges. One can capture the underlying dynamical
information that gives rise to these effects more fully using what we call the
witness map, as mentioned briefly in §4 and described at more length in [16].

The rationale behind studying the maximal m-lifespan goes back to one of
the basic premises of persistence: that features that persist for a wide range of
parameter values are in some sense meaningful. To explore this, Fig. 9 shows
the witness complex of Fig. 6(a) with the (Δm ≥ 2)-persistent edges drawn as
thicker lines: that is, edges that exist at m = 2 and persist at least to m = 4.

Figure 9: Witness complex of Fig. 6(a) with (Δm ≥ 2)-persistent edges shown as
thick (black) lines, and the Δm = 1 edges as (red) dashed lines.

There exists a fundamental core to the complex that persists as the dimension
grows and thus is robust to geometric distortion, but there are also short-lived
edges that fill in the complex in accord with the local geometric structure of
the reconstruction. Indeed, when m = 2, the projection artificially compresses
the reconstruction near the origin; small simplices fill in this region due to the
landmark clustering there. However, in the transition to m = 3 (viz., Fig. 6(b)),
this region stretches away from the origin, spreading the landmarks out. There
is a similar cluster of fragile edges near the lower left corner of the complex.

This geometric evolution with increasing reconstruction dimension leads to 
the death of many local edges. Even so, the large-scale topology is correct in 
both complexes of Fig. 6, although the fine-scale topology is resolved differently 
by the dimension-dependent geometry. So while the edges with longer lifespan 
are indeed more important to the core structure, the short-lived edges are also 
important because they allow the complex to adapt to the geometric evolution 
of the attractor and fill in the details of the skeleton that are necessary and 
meaningful in that dimension. 

In the spirit of the false near-neighbor method [10], one might be tempted to 
take the short-lived edges as an indication that the reconstruction dimension is 
inadequate. However, one computes homology from the overall complex. As the 
example above shows, homology is relatively robust with respect to individual 
edges. The moral of this story is that the lifespan of an edge is not necessarily 
an obvious indication of its importance to the homology of the complex;
Δm-persistence plays a different role here than the abscissa of traditional barcode
persistence plots. 

Figure 10: The effect of noise: One-skeletons and barcodes of witness complexes
for m = 2, ϵ = 0.41%, τ = 174T, ℓ = 201 reconstructions of the scalar time
series of Fig. 1(b) with added uniform noise of size ν = 1 (a and c) and ν = 4
(b and d), respectively 1.9% and 7.6% of the diameter of the reconstructed
attractor.

A closely related issue is noise, which is always present in real-world data
and can disturb the geometric relationships between points in the complex. To
study the effect on the fuzzy witness complex, we add uniformly distributed
noise on the interval [−ν/2, ν/2] to each point of the trajectory of Fig. 1(b),
and then perform a delay-coordinate reconstruction using m = 2. Visually,
Fig. 10(a), where ν = 1, shows a surprisingly similar complex to the noise-free
case of Fig. 5(b); however, the structure is clearly different in Fig. 10(b),
where ν = 4. Comparing the barcodes, one sees that the fine-scale structure is,
unsurprisingly, washed out for smaller values of ε than in the noise-free case.
In particular, the persistence intervals of small-scale loops are decreased. It is
encouraging, however, to see that the two major holes persist over a wide range
of ε even when the noise level is close to 2% of the diameter of the attractor.
The larger noise level (7.6% of that diameter) is enough to destroy the two large
loops so that the complex becomes acyclic when ε > 0.152.
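
The noise experiment can be mimicked on any scalar series. The sketch below
is only an illustration: it uses a sine wave as a stand-in for the Lorenz
data, assumes a forward-delay convention with the delay given as an integer
number of samples (174 here), and uses hypothetical function names. The last
two lines check the quoted noise percentages against a diameter of 37.0√2.

```python
import math
import random


def add_uniform_noise(series, nu, rng):
    """Add noise drawn uniformly from [-nu/2, nu/2] to every sample."""
    return [x + rng.uniform(-nu / 2.0, nu / 2.0) for x in series]


def delay_reconstruct(series, m, tau_samples):
    """Delay vectors (x[t], x[t + tau], ..., x[t + (m - 1) tau]).

    The forward-delay convention and the integer delay are assumptions."""
    n_vectors = len(series) - (m - 1) * tau_samples
    return [tuple(series[t + k * tau_samples] for k in range(m))
            for t in range(n_vectors)]


def noise_fraction(nu, x_range, m):
    """Noise width as a fraction of the reconstruction diameter sqrt(m) * x_range."""
    return nu / (math.sqrt(m) * x_range)


rng = random.Random(0)
clean = [18.5 * math.sin(0.01 * t) for t in range(2000)]   # stand-in data only
noisy = add_uniform_noise(clean, nu=1.0, rng=rng)
points = delay_reconstruct(noisy, m=2, tau_samples=174)

print(len(points))                                    # 1826
print(round(100 * noise_fraction(1.0, 37.0, 2), 1))   # 1.9
print(round(100 * noise_fraction(4.0, 37.0, 2), 1))   # 7.6
```

Adding the noise to the scalar series before reconstruction, as here, means
that each noisy sample perturbs several coordinates of the delay vectors.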

The relative immunity to noise is a general feature of persistent homology for 
point cloud data [17]. The additional robustness of the witness complex has two 
sources: the fuzziness parameter and the fact that multiple points can witness 
a given simplex. Even if noise moves one witness out of the shared region that 
is defined by ε, it may not move all of the witnesses to a particular simplex
out of that shared region. This suggests a possible noise mitigation technique: 
build a complex that only contains simplices that have at least n witnesses—and 
perhaps explore n-persistence of the associated homology. We plan to explore 
these ideas in future work. 
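
As a rough illustration of the n-witness idea, the sketch below keeps an edge
only if at least n_min points witness both of its landmarks. It assumes one
common form of the fuzzy witness relation (a point witnesses a landmark if
its distance to that landmark is within ε of its distance to the nearest
landmark), which may differ in detail from the relation used in this paper;
all names and the toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def witnessed_landmarks(point, landmarks, eps):
    """Indices of the landmarks witnessed by `point` under the assumed relation."""
    dists = [math.dist(point, landmark) for landmark in landmarks]
    nearest = min(dists)
    return [i for i, d in enumerate(dists) if d <= nearest + eps]


def edges_with_min_witnesses(points, landmarks, eps, n_min=1):
    """Edges (i, j), i < j, whose two landmarks share at least n_min witnesses."""
    counts = Counter()
    for point in points:
        for edge in combinations(witnessed_landmarks(point, landmarks, eps), 2):
            counts[edge] += 1
    return {edge for edge, count in counts.items() if count >= n_min}


landmarks = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (1.0, 0.1), (5.9, 0.0)]

print(sorted(edges_with_min_witnesses(points, landmarks, eps=0.5, n_min=1)))
# [(0, 1), (1, 2)]
print(sorted(edges_with_min_witnesses(points, landmarks, eps=0.5, n_min=2)))
# [(0, 1)]
```

In this toy case the edge supported by a single witness disappears when two
witnesses are required, while the doubly witnessed edge survives.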

Assessing the change in topology with changing reconstruction dimension is
a new flavor of persistence, an idea that has traditionally been applied in the
context of scale parameters like our fuzziness parameter ε. Recall that persistent
homology is based on the idea of a filtration, i.e., a nested set of complexes.
Most of the standard simplicial complex constructions for data have a natural
parameter that gives rise to a filtration; for example, the radius for a sequence
of Čech complexes. This filtration can be used to define persistent homology
[18] and identify robust topological features, e.g., those that correspond to
long-lived bars in a barcode diagram. Since edges of the witness complex are
both created and destroyed with increasing reconstruction dimension, the idea
of persistence would not seem to work for this case (except perhaps in the sense
of zigzag persistence [25]). However, if we use Δm-persistent edges (edges that
exist for some range of m), we can get a filtration. If L(Δm) is the set of edges
that exist for a range of at least Δm, then L(Δm) ⊂ L(Δm − 1). The same would
be true for a (clique) complex built from these edges. Note that this gives,
somewhat surprisingly, an inclusion as Δm decreases, the reverse of what one
might think at first.
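
The nesting can be illustrated with toy data. The sketch below assumes that
each edge has already been assigned a maximal lifespan and that the level set
for a given threshold collects every edge whose lifespan is at least that
threshold; the edge list and function name are hypothetical.

```python
def edges_with_lifespan_at_least(lifespans, dm):
    """Edges whose maximal lifespan is at least dm."""
    return {edge for edge, span in lifespans.items() if span >= dm}


# Toy lifespans for four edges, keyed by landmark index pairs.
lifespans = {(0, 1): 3, (0, 2): 1, (1, 3): 8, (2, 3): 2}

levels = [edges_with_lifespan_at_least(lifespans, dm) for dm in range(1, 9)]

# Each level is contained in the previous one, so reading the list from the
# largest threshold down to the smallest gives a growing (nested) sequence.
nested = all(levels[k + 1] <= levels[k] for k in range(len(levels) - 1))
print(nested)                      # True
print([len(s) for s in levels])    # [4, 3, 2, 1, 1, 1, 1, 1]
```

The inclusion holds only for the "at least" reading; sets of edges with
lifespan exactly equal to each threshold would be disjoint rather than nested.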


4. Conclusion & Future Work 

We have shown that it is possible to compute the topology of an invariant
set of a dynamical system using a coarse-grained simplicial complex, a witness
complex, built from a low-dimensional reconstruction of a scalar time series.
These results have a number of interesting implications. Among other
things, they suggest that the traditional delay-coordinate reconstruction
process is excessive if one is only interested in topological structure. Indeed,
this explains why it is possible to construct accurate predictions of the future
state of a high-dimensional dynamical system using a two-dimensional
delay-coordinate reconstruction [26]. The delay-coordinate machinery strives to
obtain a diffeomorphism, not a homeomorphism, between the true and reconstructed
attractors. However, many of the important properties of attractors
(transitivity, continuity, recurrence, entropy, etc.) are topological, so requiring
only a homeomorphism is natural and more efficient [2].

Given a finite set of data, and finite computational power, one can never 
compute the full topology, of course. Nevertheless, it is useful to obtain a 
coarse approximation of the topological features—e.g., the two main holes in the 
canonical Lorenz attractor. Moreover, the scale of the exploration is controlled 
by the spacing between the landmarks in the witness complex, so finer structure 
can be observed if needed. There is a computational cost, of course, even with 
the natural parsimony of the witness complex. 

To choose the various free parameters in this approach, we used ideas from
persistent homology to investigate the dependence of the complex on the number
of landmarks ℓ and on the fuzziness parameter ε. It might be possible to
make this multi-parameter exploration of persistence rigorous using ideas
related to those of [27]. To study persistence across reconstruction dimension,
we introduced the maximum lifespan Δm. This parameter gives a filtration so that
the standard persistent homology methods apply. We found that the long-lived
edges determine the core of the complex and that the shorter-lived edges fill in
the fine-grained structure. Of course, some edges are superfluous for the
homology, and destroying them does not change the Betti numbers. This redundancy
is part of the strength of the witness complex approach, and it contributes to
the noise immunity that we have observed.

In this paper we have used a fuzzy witness relation based on (2) and built a 
clique complex (3). Since the structure of such a complex is determined by its 
edges, it is computationally efficient. In the future we plan to investigate other 
choices for the witness relation, e.g. [3, 28], and to see if removing the clique 
assumption allows for greater fidelity without excess computational burden. It
would also be interesting to investigate how the robustness with respect to
noise varies with the choice of relation.

Our approach not only provides a strategy for selecting a reconstruction
dimension that reveals the (approximate) homology of the attractor; it is also a
step along the path to detecting and characterizing bifurcations [29]. Suppose
that, for example, the data correspond trivially to an equilibrium, that is, to a
set with βₖ = 0 for all k > 0, but that this equilibrium undergoes a bifurcation to
an oscillatory regime. In this case, a shift in β₁ signals a regime change. Regime
changes in a nonstationary data set could also be due to dynamical parameters
changing with time, or to switching from one dynamical system to another,
as in an iterated function system [24].

Further analysis of the dynamics, beyond the static homology of the points in
the reconstructed trajectory, requires the construction of a map on the witness
complex that is induced by the temporal shift on the time series. Following the
ideas of [16], the time-ordering of the data gives rise to a simplicial multi-valued
map that we call the witness map. Under certain conditions, this map can
be shown to induce a map on homology that is the same as that induced by any
continuous “selector.” This allows one to compute the Conley index, which can
be used to prove the existence of various invariant sets.

This paper uses a single example, the classic Lorenz system, but we have
observed similar results in other dynamical systems. We plan in the future to
do a careful exploration of additional systems, both maps and flows, to explore
the effects of the size and spacing of the data and to study their interaction
with the fuzziness parameter ε and the number ℓ of landmarks. The edge-length
distribution might be a useful aid in choosing good values for the former; for
the latter, it may be useful to examine the distribution of witness-landmark
distances. The underlying problem is not simple; the data sample the invariant
set, the landmarks sample the data, and the complex reflects the geometric
relationships between the landmarks and the rest of the data. One could get
at some of this by changing the data length (say, using half of the time-series
data), repeating the analysis, and looking to see if the “best” parameter values
change. It will be important to determine guidelines for the optimal number
and distribution of the landmarks, and how this should scale with m for a fixed
trajectory.